Regional plot¶
load gwaslab¶
In [1]:
Copied!
import sys
sys.path.insert(0,"/Users/he/work/gwaslab/src")
import gwaslab as gl
import sys
sys.path.insert(0,"/Users/he/work/gwaslab/src")
import gwaslab as gl
download sample data¶
In [2]:
Copied!
!wget -O t2d_bbj.txt.gz http://jenger.riken.jp/14/
!wget -O t2d_bbj.txt.gz http://jenger.riken.jp/14/
--2022-10-20 22:22:11-- http://jenger.riken.jp/14/ Resolving jenger.riken.jp (jenger.riken.jp)... 134.160.84.25 Connecting to jenger.riken.jp (jenger.riken.jp)|134.160.84.25|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 274187574 (261M) [text/plain] Saving to: ‘t2d_bbj.txt.gz’ t2d_bbj.txt.gz 100%[===================>] 261.49M 10.6MB/s in 25s 2022-10-20 22:22:36 (10.6 MB/s) - ‘t2d_bbj.txt.gz’ saved [274187574/274187574]
load sumstats into gwaslab.Sumstats¶
In [2]:
Copied!
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
snpid="SNP",
chrom="CHR",
pos="POS",
ea="ALT",
nea="REF",
neaf="Frq",
beta="BETA",
se="SE",
p="P",
direction="Dir",
n="N")
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
snpid="SNP",
chrom="CHR",
pos="POS",
ea="ALT",
nea="REF",
neaf="Frq",
beta="BETA",
se="SE",
p="P",
direction="Dir",
n="N")
Fri Nov 4 12:13:56 2022 Start to initiate from file :t2d_bbj.txt.gz Fri Nov 4 12:14:26 2022 -Reading columns : SNP,BETA,SE,CHR,REF,ALT,Frq,P,N,POS,Dir Fri Nov 4 12:14:26 2022 -Renaming columns to : SNPID,BETA,SE,CHR,NEA,EA,EAF,P,N,POS,DIRECTION Fri Nov 4 12:14:26 2022 -Current dataframe shape : Rows 12557761 x 11 Columns Fri Nov 4 12:14:32 2022 -Initiating a status column ... Fri Nov 4 12:14:39 2022 -NEAF is specified... Fri Nov 4 12:14:39 2022 -Checking if 0<= NEAF <=1 ... Fri Nov 4 12:14:43 2022 -Converted NEAF to EAF. Fri Nov 4 12:14:43 2022 -Removed 0 variants with bad NEAF. Fri Nov 4 12:14:43 2022 -Reordering columns to : SNPID,CHR,POS,EA,NEA,EAF,BETA,SE,P,N,DIRECTION,STATUS Fri Nov 4 12:14:47 2022 Finished loading data successfully!
a quick mmq plot¶
In [3]:
Copied!
mysumstats.plot_mqq(skip=3,cut=20, mode="mqq",stratified=True)
mysumstats.plot_mqq(skip=3,cut=20, mode="mqq",stratified=True)
Fri Nov 4 12:14:47 2022 Start to plot manhattan/qq plot with the following basic settings: Fri Nov 4 12:14:47 2022 -Genome-wide significance level is set to 5e-08 ... Fri Nov 4 12:14:47 2022 -Raw input contains 12557761 variants... Fri Nov 4 12:14:47 2022 -Plot layout mode is : mqq Fri Nov 4 12:14:56 2022 Finished loading specified columns from the sumstats. Fri Nov 4 12:14:56 2022 Start conversion and sanity check: Fri Nov 4 12:14:59 2022 -Removed 328791 variants with nan in CHR or POS column ... Fri Nov 4 12:15:00 2022 -Removed 0 variants with nan in EAF column ... Fri Nov 4 12:15:01 2022 -Removed 0 variants with nan in P column ... Fri Nov 4 12:15:01 2022 -P values are being converted to -log10(P)... Fri Nov 4 12:15:01 2022 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... Fri Nov 4 12:15:03 2022 -Sanity check: 0 na/inf/-inf variants will be removed... Fri Nov 4 12:15:07 2022 -Maximum -log10(P) values is 167.58838029403677 . Fri Nov 4 12:15:07 2022 -Minus log10(P) values above 20 will be shrunk with a shrinkage factor of 10... Fri Nov 4 12:15:07 2022 Finished data conversion and sanity check. Fri Nov 4 12:15:08 2022 Start to create manhattan plot with 87571 variants: Fri Nov 4 12:15:09 2022 -Found 84 significant variants with a sliding window size of 500 kb... Fri Nov 4 12:15:09 2022 Finished creating Manhattan plot successfully! Fri Nov 4 12:15:09 2022 -Skip annotating Fri Nov 4 12:15:09 2022 Start to create QQ plot with 87571 variants: Fri Nov 4 12:15:11 2022 -Calculating GC using P : 1.2139048028292598 Fri Nov 4 12:15:11 2022 Finished creating QQ plot successfully!
Out[3]:
(<Figure size 3000x1000 with 2 Axes>, <gwaslab.Log.Log at 0x7fcd214a0a30>)
filter and basic check¶
In [5]:
Copied!
mysumstats.filter_in(eq={"CHR":"7"},inplace=True)
mysumstats.basic_check()
mysumstats.filter_in(eq={"CHR":"7"},inplace=True)
mysumstats.basic_check()
Fri Nov 4 12:15:32 2022 Start filtering values: Fri Nov 4 12:15:33 2022 -Keeping 707780 variants with CHR = 7 ... Fri Nov 4 12:15:34 2022 Finished filtering values. Fri Nov 4 12:15:34 2022 Start to check IDs... Fri Nov 4 12:15:34 2022 -Current Dataframe shape : 707780 x 12 Fri Nov 4 12:15:34 2022 -Checking if SNPID is chr:pos:ref:alt...(separator: - ,: , _) Fri Nov 4 12:15:36 2022 Finished checking IDs successfully! Fri Nov 4 12:15:36 2022 Start to fix chromosome notation... Fri Nov 4 12:15:36 2022 -Current Dataframe shape : 707780 x 12 Fri Nov 4 12:15:39 2022 -All CHR are already fixed... Fri Nov 4 12:15:41 2022 Finished fixing chromosome notation successfully! Fri Nov 4 12:15:41 2022 Start to fix basepair positions... Fri Nov 4 12:15:41 2022 -Current Dataframe shape : 707780 x 12 Fri Nov 4 12:15:43 2022 -Position upper_bound is: 250,000,000 Fri Nov 4 12:15:46 2022 -Remove outliers: 0 Fri Nov 4 12:15:46 2022 -Converted all position to datatype Int64. Fri Nov 4 12:15:46 2022 Finished fixing basepair position successfully! Fri Nov 4 12:15:46 2022 Start to fix alleles... Fri Nov 4 12:15:46 2022 -Current Dataframe shape : 707780 x 12 Fri Nov 4 12:15:46 2022 -Detected 0 variants with alleles that contain bases other than A/C/T/G . Fri Nov 4 12:15:46 2022 -Converted all bases to string datatype and UPPERCASE. Fri Nov 4 12:15:48 2022 Finished fixing allele successfully! Fri Nov 4 12:15:48 2022 Start sanity check for statistics ... Fri Nov 4 12:15:48 2022 -Current Dataframe shape : 707780 x 12 Fri Nov 4 12:15:48 2022 -Checking if 0 <=N<= inf ... Fri Nov 4 12:15:49 2022 -Removed 0 variants with bad N. Fri Nov 4 12:15:49 2022 -Checking if 0 <=EAF<= 1 ... Fri Nov 4 12:15:49 2022 -Removed 0 variants with bad EAF. Fri Nov 4 12:15:49 2022 -Checking if 5 <=MAC<= inf ... Fri Nov 4 12:15:49 2022 -Removed 0 variants with bad MAC. Fri Nov 4 12:15:49 2022 -Checking if 5e-300 <= P <= 1 ... Fri Nov 4 12:15:49 2022 -Removed 0 variants with bad P. Fri Nov 4 12:15:49 2022 -Checking if -10 <BETA)< 10 ... Fri Nov 4 12:15:49 2022 -Removed 0 variants with bad BETA. Fri Nov 4 12:15:49 2022 -Checking if 0 <SE< inf ... Fri Nov 4 12:15:49 2022 -Removed 0 variants with bad SE. Fri Nov 4 12:15:49 2022 -Checking STATUS... Fri Nov 4 12:15:50 2022 -Coverting STAUTUS to category. Fri Nov 4 12:15:50 2022 -Removed 0 variants with bad statistics in total. Fri Nov 4 12:15:50 2022 Finished sanity check successfully! Fri Nov 4 12:15:50 2022 Start to normalize variants... Fri Nov 4 12:15:50 2022 -Current Dataframe shape : 707780 x 12 Fri Nov 4 12:15:51 2022 -No available variants to normalize.. Fri Nov 4 12:15:51 2022 Finished normalizing variants successfully!
In [6]:
Copied!
mysumstats.plot_mqq(mode="m",anno="GENENAME",anno_source="ensembl",build="19")
mysumstats.plot_mqq(mode="m",anno="GENENAME",anno_source="ensembl",build="19")
Fri Nov 4 12:15:51 2022 Start to plot manhattan/qq plot with the following basic settings: Fri Nov 4 12:15:51 2022 -Genome-wide significance level is set to 5e-08 ... Fri Nov 4 12:15:51 2022 -Raw input contains 707780 variants... Fri Nov 4 12:15:51 2022 -Plot layout mode is : m Fri Nov 4 12:15:51 2022 Finished loading specified columns from the sumstats. Fri Nov 4 12:15:51 2022 Start conversion and sanity check: Fri Nov 4 12:15:51 2022 -Removed 0 variants with nan in CHR or POS column ... Fri Nov 4 12:15:51 2022 -Removed 0 variants with nan in P column ... Fri Nov 4 12:15:51 2022 -P values are being converted to -log10(P)... Fri Nov 4 12:15:51 2022 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... Fri Nov 4 12:15:51 2022 -Sanity check: 0 na/inf/-inf variants will be removed... Fri Nov 4 12:15:51 2022 -Maximum -log10(P) values is 73.38711023071251 . Fri Nov 4 12:15:51 2022 Finished data conversion and sanity check. Fri Nov 4 12:15:53 2022 Start to create manhattan plot with 707780 variants: Fri Nov 4 12:15:56 2022 -Found 7 significant variants with a sliding window size of 500 kb... Fri Nov 4 12:15:56 2022 Start to annotate variants with nearest gene name(s)... Fri Nov 4 12:15:56 2022 -Assigning Gene name using built-in Ensembl Release 75 (hg19) Fri Nov 4 12:15:56 2022 Finished annotating variants with nearest gene name(s) successfully! Fri Nov 4 12:15:56 2022 Finished creating Manhattan plot successfully! Fri Nov 4 12:15:56 2022 -Annotating using column GENENAME...
Out[6]:
(<Figure size 3000x1000 with 1 Axes>, <gwaslab.Log.Log at 0x7fcd214a0a30>)
In [7]:
Copied!
mysumstats.get_lead()
mysumstats.get_lead()
Fri Nov 4 12:16:01 2022 Start to extract lead variants... Fri Nov 4 12:16:01 2022 -Processing 707780 variants... Fri Nov 4 12:16:01 2022 -Significance threshold : 5e-08 Fri Nov 4 12:16:01 2022 -Sliding window size: 500 kb Fri Nov 4 12:16:01 2022 -Found 1077 significant variants in total... Fri Nov 4 12:16:01 2022 -Identified 7 lead variants! Fri Nov 4 12:16:01 2022 Finished extracting lead variants successfully!
Out[7]:
| SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | N | DIRECTION | STATUS | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5474489 | 7:13888699_G_C | 7 | 13888699 | G | C | 0.5680 | 0.0562 | 0.0094 | 2.507000e-09 | 191764 | ++++ | 9960099 |
| 5480067 | 7:14898282_C_T | 7 | 14898282 | C | T | 0.6012 | 0.0617 | 0.0088 | 2.336000e-12 | 191764 | ++++ | 9960099 |
| 5626346 | 7:44174857_T_G | 7 | 44174857 | G | T | 0.5985 | -0.0640 | 0.0093 | 5.325000e-12 | 191764 | ---- | 9960099 |
| 5732279 | 7:69406661_A_T | 7 | 69406661 | T | A | 0.1981 | -0.0900 | 0.0111 | 4.871000e-16 | 191764 | ---- | 9960099 |
| 5965364 | 7:127253550_C_T | 7 | 127253550 | C | T | 0.9081 | 0.2761 | 0.0152 | 4.101000e-74 | 191764 | ++++ | 9960099 |
| 5976830 | 7:130025713_G_A | 7 | 130025713 | G | A | 0.9530 | -0.1365 | 0.0230 | 3.068000e-09 | 191764 | ---- | 9960099 |
| 6092347 | 7:157038803_A_G | 7 | 157038803 | G | A | 0.4626 | -0.0502 | 0.0088 | 1.127000e-08 | 191764 | ---- | 9960099 |
In [8]:
Copied!
mysumstats.get_lead(anno=True)
mysumstats.get_lead(anno=True)
Fri Nov 4 12:16:01 2022 Start to extract lead variants... Fri Nov 4 12:16:01 2022 -Processing 707780 variants... Fri Nov 4 12:16:01 2022 -Significance threshold : 5e-08 Fri Nov 4 12:16:01 2022 -Sliding window size: 500 kb Fri Nov 4 12:16:01 2022 -Found 1077 significant variants in total... Fri Nov 4 12:16:02 2022 -Identified 7 lead variants! Fri Nov 4 12:16:02 2022 Start to annotate variants with nearest gene name(s)... Fri Nov 4 12:16:02 2022 -Assigning Gene name using built-in Ensembl Release 75 (hg19) Fri Nov 4 12:16:02 2022 Finished annotating variants with nearest gene name(s) successfully! Fri Nov 4 12:16:02 2022 Finished extracting lead variants successfully!
Out[8]:
| SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | N | DIRECTION | STATUS | LOCATION | GENE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5474489 | 7:13888699_G_C | 7 | 13888699 | G | C | 0.5680 | 0.0562 | 0.0094 | 2.507000e-09 | 191764 | ++++ | 9960099 | 42154 | ETV1 |
| 5480067 | 7:14898282_C_T | 7 | 14898282 | C | T | 0.6012 | 0.0617 | 0.0088 | 2.336000e-12 | 191764 | ++++ | 9960099 | 0 | DGKB |
| 5626346 | 7:44174857_T_G | 7 | 44174857 | G | T | 0.5985 | -0.0640 | 0.0093 | 5.325000e-12 | 191764 | ---- | 9960099 | 3606 | MYL7 |
| 5732279 | 7:69406661_A_T | 7 | 69406661 | T | A | 0.1981 | -0.0900 | 0.0111 | 4.871000e-16 | 191764 | ---- | 9960099 | 0 | AUTS2 |
| 5965364 | 7:127253550_C_T | 7 | 127253550 | C | T | 0.9081 | 0.2761 | 0.0152 | 4.101000e-74 | 191764 | ++++ | 9960099 | 0 | PAX4 |
| 5976830 | 7:130025713_G_A | 7 | 130025713 | G | A | 0.9530 | -0.1365 | 0.0230 | 3.068000e-09 | 191764 | ---- | 9960099 | 0 | CPA1 |
| 6092347 | 7:157038803_A_G | 7 | 157038803 | G | A | 0.4626 | -0.0502 | 0.0088 | 1.127000e-08 | 191764 | ---- | 9960099 | 0 | UBE3C |
In [9]:
Copied!
mysumstats.plot_mqq(region=(7,156538803,157538803))
mysumstats.plot_mqq(region=(7,156538803,157538803))
Fri Nov 4 12:16:02 2022 Start to plot manhattan/qq plot with the following basic settings: Fri Nov 4 12:16:02 2022 -Genome-wide significance level is set to 5e-08 ... Fri Nov 4 12:16:02 2022 -Raw input contains 707780 variants... Fri Nov 4 12:16:02 2022 -Plot layout mode is : mqq Fri Nov 4 12:16:02 2022 -Region to plot : chr7:156538803-157538803. Fri Nov 4 12:16:02 2022 -Extract SNPs in region : chr7:156538803-157538803... Fri Nov 4 12:16:04 2022 -Extract SNPs in specified regions: 5831 Fri Nov 4 12:16:04 2022 Finished loading specified columns from the sumstats. Fri Nov 4 12:16:04 2022 Start conversion and sanity check: Fri Nov 4 12:16:04 2022 -Removed 0 variants with nan in CHR or POS column ... Fri Nov 4 12:16:04 2022 -Removed 0 variants with nan in P column ... Fri Nov 4 12:16:04 2022 -P values are being converted to -log10(P)... Fri Nov 4 12:16:04 2022 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... Fri Nov 4 12:16:04 2022 -Sanity check: 0 na/inf/-inf variants will be removed... Fri Nov 4 12:16:05 2022 -Maximum -log10(P) values is 7.948076083953893 . Fri Nov 4 12:16:05 2022 Finished data conversion and sanity check. Fri Nov 4 12:16:05 2022 Start to create manhattan plot with 5831 variants: Fri Nov 4 12:16:05 2022 -Found 1 significant variants with a sliding window size of 500 kb... Fri Nov 4 12:16:05 2022 Finished creating Manhattan plot successfully! Fri Nov 4 12:16:05 2022 -Skip annotating Fri Nov 4 12:16:05 2022 Start to create QQ plot with 5831 variants: Fri Nov 4 12:16:05 2022 -Calculating GC using P : 1.7191388184129959 Fri Nov 4 12:16:05 2022 Finished creating QQ plot successfully!
Out[9]:
(<Figure size 3000x1000 with 3 Axes>, <gwaslab.Log.Log at 0x7fcd214a0a30>)
In [10]:
Copied!
mysumstats.plot_mqq(mode="r", region=(7,156538803,157538803),region_grid=True, gtf_path="ensembl")
mysumstats.plot_mqq(mode="r", region=(7,156538803,157538803),region_grid=True, gtf_path="ensembl")
Fri Nov 4 12:16:06 2022 Start to plot manhattan/qq plot with the following basic settings: Fri Nov 4 12:16:06 2022 -Genome-wide significance level is set to 5e-08 ... Fri Nov 4 12:16:06 2022 -Raw input contains 707780 variants... Fri Nov 4 12:16:06 2022 -Plot layout mode is : r Fri Nov 4 12:16:06 2022 -Region to plot : chr7:156538803-157538803. Fri Nov 4 12:16:06 2022 -Extract SNPs in region : chr7:156538803-157538803... Fri Nov 4 12:16:08 2022 -Extract SNPs in specified regions: 5831 Fri Nov 4 12:16:08 2022 Finished loading specified columns from the sumstats. Fri Nov 4 12:16:08 2022 Start conversion and sanity check: Fri Nov 4 12:16:08 2022 -Removed 0 variants with nan in CHR or POS column ... Fri Nov 4 12:16:08 2022 -Removed 0 variants with nan in P column ... Fri Nov 4 12:16:08 2022 -P values are being converted to -log10(P)... Fri Nov 4 12:16:08 2022 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... Fri Nov 4 12:16:08 2022 -Sanity check: 0 na/inf/-inf variants will be removed... Fri Nov 4 12:16:09 2022 -Maximum -log10(P) values is 7.948076083953893 . Fri Nov 4 12:16:09 2022 Finished data conversion and sanity check. Fri Nov 4 12:16:09 2022 Start to create manhattan plot with 5831 variants: Fri Nov 4 12:16:09 2022 -Loading gtf files from:ensembl Fri Nov 4 12:16:27 2022 -plotting gene track.. Fri Nov 4 12:16:27 2022 -Finished plotting gene track.. Fri Nov 4 12:16:27 2022 -Found 1 significant variants with a sliding window size of 500 kb... Fri Nov 4 12:16:27 2022 Finished creating Manhattan plot successfully! Fri Nov 4 12:16:27 2022 -Skip annotating
Out[10]:
(<Figure size 3000x2000 with 3 Axes>, <gwaslab.Log.Log at 0x7fcd214a0a30>)
In [11]:
Copied!
lead_variants = mysumstats.get_lead()
lead_variants
lead_variants = mysumstats.get_lead()
lead_variants
Fri Nov 4 12:16:28 2022 Start to extract lead variants... Fri Nov 4 12:16:28 2022 -Processing 707780 variants... Fri Nov 4 12:16:28 2022 -Significance threshold : 5e-08 Fri Nov 4 12:16:28 2022 -Sliding window size: 500 kb Fri Nov 4 12:16:29 2022 -Found 1077 significant variants in total... Fri Nov 4 12:16:29 2022 -Identified 7 lead variants! Fri Nov 4 12:16:29 2022 Finished extracting lead variants successfully!
Out[11]:
| SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | N | DIRECTION | STATUS | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5474489 | 7:13888699_G_C | 7 | 13888699 | G | C | 0.5680 | 0.0562 | 0.0094 | 2.507000e-09 | 191764 | ++++ | 9960099 |
| 5480067 | 7:14898282_C_T | 7 | 14898282 | C | T | 0.6012 | 0.0617 | 0.0088 | 2.336000e-12 | 191764 | ++++ | 9960099 |
| 5626346 | 7:44174857_T_G | 7 | 44174857 | G | T | 0.5985 | -0.0640 | 0.0093 | 5.325000e-12 | 191764 | ---- | 9960099 |
| 5732279 | 7:69406661_A_T | 7 | 69406661 | T | A | 0.1981 | -0.0900 | 0.0111 | 4.871000e-16 | 191764 | ---- | 9960099 |
| 5965364 | 7:127253550_C_T | 7 | 127253550 | C | T | 0.9081 | 0.2761 | 0.0152 | 4.101000e-74 | 191764 | ++++ | 9960099 |
| 5976830 | 7:130025713_G_A | 7 | 130025713 | G | A | 0.9530 | -0.1365 | 0.0230 | 3.068000e-09 | 191764 | ---- | 9960099 |
| 6092347 | 7:157038803_A_G | 7 | 157038803 | G | A | 0.4626 | -0.0502 | 0.0088 | 1.127000e-08 | 191764 | ---- | 9960099 |
In [8]:
Copied!
flank = 100000
for index,row in lead_variants.iterrows():
mysumstats.plot_mqq(mode="r",region=(row["CHR"],row["POS"]-flank,row["POS"]+flank),region_grid=True,mqqratio=2,taf=[4,0,0.95,1,1],figargs={"figsize":(15,5),"dpi":300})
#vcf_path="/home/yunye/mydata/d_disk/eas_1kg_af/EAS.chr7.split_norm_af.vcf.gz")
flank = 100000
for index,row in lead_variants.iterrows():
mysumstats.plot_mqq(mode="r",region=(row["CHR"],row["POS"]-flank,row["POS"]+flank),region_grid=True,mqqratio=2,taf=[4,0,0.95,1,1],figargs={"figsize":(15,5),"dpi":300})
#vcf_path="/home/yunye/mydata/d_disk/eas_1kg_af/EAS.chr7.split_norm_af.vcf.gz")
Fri Oct 21 01:20:36 2022 Start to plot manhattan/qq plot with the following basic settings: Fri Oct 21 01:20:36 2022 -Genome-wide significance level is set to 5e-08 ... Fri Oct 21 01:20:36 2022 -Raw input contains 707780 variants... Fri Oct 21 01:20:36 2022 -Plot layout mode is : r Fri Oct 21 01:20:36 2022 -Region to plot : chr7:13788699-13988699. Fri Oct 21 01:20:36 2022 -Extract SNPs in region : chr7:13788699-13988699... Fri Oct 21 01:20:37 2022 -Extract SNPs in specified regions: 1225 Fri Oct 21 01:20:37 2022 Finished loading specified columns from the sumstats. Fri Oct 21 01:20:37 2022 Start conversion and sanity check: Fri Oct 21 01:20:37 2022 -Removed 0 variants with nan in P column ... Fri Oct 21 01:20:37 2022 -P values are being converted to -log10(P)... Fri Oct 21 01:20:37 2022 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... Fri Oct 21 01:20:37 2022 -Sanity check: 0 na/inf/-inf variants will be removed... Fri Oct 21 01:20:37 2022 -Maximum -log10(P) values is 8.600845666041783 . Fri Oct 21 01:20:37 2022 Finished data conversion and sanity check. Fri Oct 21 01:20:37 2022 Start to create manhattan plot with 1224 variants: Fri Oct 21 01:20:37 2022 -Loading gtf files from:defualt Fri Oct 21 01:20:46 2022 -plotting gene track.. Fri Oct 21 01:20:46 2022 -Finished plotting gene track.. Fri Oct 21 01:20:46 2022 -Found 1 significant variants with a sliding window size of 500 kb... Fri Oct 21 01:20:46 2022 Finished creating Manhattan plot successfully! Fri Oct 21 01:20:46 2022 -Skip annotating Fri Oct 21 01:20:46 2022 Start to plot manhattan/qq plot with the following basic settings: Fri Oct 21 01:20:46 2022 -Genome-wide significance level is set to 5e-08 ... Fri Oct 21 01:20:46 2022 -Raw input contains 707780 variants... Fri Oct 21 01:20:46 2022 -Plot layout mode is : r Fri Oct 21 01:20:46 2022 -Region to plot : chr7:14798282-14998282. Fri Oct 21 01:20:46 2022 -Extract SNPs in region : chr7:14798282-14998282... Fri Oct 21 01:20:47 2022 -Extract SNPs in specified regions: 993 Fri Oct 21 01:20:47 2022 Finished loading specified columns from the sumstats. Fri Oct 21 01:20:47 2022 Start conversion and sanity check: Fri Oct 21 01:20:47 2022 -Removed 0 variants with nan in P column ... Fri Oct 21 01:20:47 2022 -P values are being converted to -log10(P)... Fri Oct 21 01:20:47 2022 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... Fri Oct 21 01:20:47 2022 -Sanity check: 0 na/inf/-inf variants will be removed... Fri Oct 21 01:20:47 2022 -Maximum -log10(P) values is 11.631527161559639 . Fri Oct 21 01:20:47 2022 Finished data conversion and sanity check. Fri Oct 21 01:20:47 2022 Start to create manhattan plot with 993 variants: Fri Oct 21 01:20:47 2022 -Loading gtf files from:defualt Fri Oct 21 01:20:56 2022 -plotting gene track.. Fri Oct 21 01:20:56 2022 -Finished plotting gene track.. Fri Oct 21 01:20:56 2022 -Found 1 significant variants with a sliding window size of 500 kb... Fri Oct 21 01:20:56 2022 Finished creating Manhattan plot successfully! Fri Oct 21 01:20:56 2022 -Skip annotating Fri Oct 21 01:20:56 2022 Start to plot manhattan/qq plot with the following basic settings: Fri Oct 21 01:20:56 2022 -Genome-wide significance level is set to 5e-08 ... Fri Oct 21 01:20:56 2022 -Raw input contains 707780 variants... Fri Oct 21 01:20:56 2022 -Plot layout mode is : r Fri Oct 21 01:20:56 2022 -Region to plot : chr7:44074857-44274857. Fri Oct 21 01:20:56 2022 -Extract SNPs in region : chr7:44074857-44274857... Fri Oct 21 01:20:58 2022 -Extract SNPs in specified regions: 929 Fri Oct 21 01:20:58 2022 Finished loading specified columns from the sumstats. Fri Oct 21 01:20:58 2022 Start conversion and sanity check: Fri Oct 21 01:20:58 2022 -Removed 0 variants with nan in P column ... Fri Oct 21 01:20:58 2022 -P values are being converted to -log10(P)... Fri Oct 21 01:20:58 2022 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... Fri Oct 21 01:20:58 2022 -Sanity check: 0 na/inf/-inf variants will be removed... Fri Oct 21 01:20:58 2022 -Maximum -log10(P) values is 11.273680387889225 . Fri Oct 21 01:20:58 2022 Finished data conversion and sanity check. Fri Oct 21 01:20:58 2022 Start to create manhattan plot with 929 variants: Fri Oct 21 01:20:58 2022 -Loading gtf files from:defualt Fri Oct 21 01:21:07 2022 -plotting gene track.. Fri Oct 21 01:21:07 2022 -Finished plotting gene track.. Fri Oct 21 01:21:07 2022 -Found 1 significant variants with a sliding window size of 500 kb... Fri Oct 21 01:21:07 2022 Finished creating Manhattan plot successfully! Fri Oct 21 01:21:07 2022 -Skip annotating Fri Oct 21 01:21:07 2022 Start to plot manhattan/qq plot with the following basic settings: Fri Oct 21 01:21:07 2022 -Genome-wide significance level is set to 5e-08 ... Fri Oct 21 01:21:07 2022 -Raw input contains 707780 variants... Fri Oct 21 01:21:07 2022 -Plot layout mode is : r Fri Oct 21 01:21:07 2022 -Region to plot : chr7:69306661-69506661. Fri Oct 21 01:21:07 2022 -Extract SNPs in region : chr7:69306661-69506661... Fri Oct 21 01:21:09 2022 -Extract SNPs in specified regions: 501 Fri Oct 21 01:21:09 2022 Finished loading specified columns from the sumstats. Fri Oct 21 01:21:09 2022 Start conversion and sanity check: Fri Oct 21 01:21:09 2022 -Removed 0 variants with nan in P column ... Fri Oct 21 01:21:09 2022 -P values are being converted to -log10(P)... Fri Oct 21 01:21:09 2022 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... Fri Oct 21 01:21:09 2022 -Sanity check: 0 na/inf/-inf variants will be removed... Fri Oct 21 01:21:09 2022 -Maximum -log10(P) values is 15.31238187042823 . Fri Oct 21 01:21:09 2022 Finished data conversion and sanity check. Fri Oct 21 01:21:09 2022 Start to create manhattan plot with 501 variants: Fri Oct 21 01:21:09 2022 -Loading gtf files from:defualt Fri Oct 21 01:21:18 2022 -plotting gene track.. Fri Oct 21 01:21:18 2022 -Finished plotting gene track.. Fri Oct 21 01:21:18 2022 -Found 1 significant variants with a sliding window size of 500 kb... Fri Oct 21 01:21:18 2022 Finished creating Manhattan plot successfully! Fri Oct 21 01:21:18 2022 -Skip annotating Fri Oct 21 01:21:18 2022 Start to plot manhattan/qq plot with the following basic settings: Fri Oct 21 01:21:18 2022 -Genome-wide significance level is set to 5e-08 ... Fri Oct 21 01:21:18 2022 -Raw input contains 707780 variants... Fri Oct 21 01:21:18 2022 -Plot layout mode is : r Fri Oct 21 01:21:18 2022 -Region to plot : chr7:127153550-127353550. Fri Oct 21 01:21:18 2022 -Extract SNPs in region : chr7:127153550-127353550... Fri Oct 21 01:21:19 2022 -Extract SNPs in specified regions: 713 Fri Oct 21 01:21:19 2022 Finished loading specified columns from the sumstats. Fri Oct 21 01:21:19 2022 Start conversion and sanity check: Fri Oct 21 01:21:19 2022 -Removed 0 variants with nan in P column ... Fri Oct 21 01:21:19 2022 -P values are being converted to -log10(P)... Fri Oct 21 01:21:19 2022 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... Fri Oct 21 01:21:19 2022 -Sanity check: 0 na/inf/-inf variants will be removed... Fri Oct 21 01:21:19 2022 -Maximum -log10(P) values is 73.38711023071251 . Fri Oct 21 01:21:19 2022 Finished data conversion and sanity check. Fri Oct 21 01:21:19 2022 Start to create manhattan plot with 713 variants: Fri Oct 21 01:21:19 2022 -Loading gtf files from:defualt Fri Oct 21 01:21:28 2022 -plotting gene track.. Fri Oct 21 01:21:28 2022 -Finished plotting gene track.. Fri Oct 21 01:21:28 2022 -Found 1 significant variants with a sliding window size of 500 kb... Fri Oct 21 01:21:28 2022 Finished creating Manhattan plot successfully! Fri Oct 21 01:21:28 2022 -Skip annotating Fri Oct 21 01:21:28 2022 Start to plot manhattan/qq plot with the following basic settings: Fri Oct 21 01:21:28 2022 -Genome-wide significance level is set to 5e-08 ... Fri Oct 21 01:21:28 2022 -Raw input contains 707780 variants... Fri Oct 21 01:21:28 2022 -Plot layout mode is : r Fri Oct 21 01:21:28 2022 -Region to plot : chr7:129925713-130125713. Fri Oct 21 01:21:28 2022 -Extract SNPs in region : chr7:129925713-130125713... Fri Oct 21 01:21:30 2022 -Extract SNPs in specified regions: 808 Fri Oct 21 01:21:30 2022 Finished loading specified columns from the sumstats. Fri Oct 21 01:21:30 2022 Start conversion and sanity check: Fri Oct 21 01:21:30 2022 -Removed 0 variants with nan in P column ... Fri Oct 21 01:21:30 2022 -P values are being converted to -log10(P)... Fri Oct 21 01:21:30 2022 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... Fri Oct 21 01:21:30 2022 -Sanity check: 0 na/inf/-inf variants will be removed... Fri Oct 21 01:21:30 2022 -Maximum -log10(P) values is 8.513144644723056 . Fri Oct 21 01:21:30 2022 Finished data conversion and sanity check. Fri Oct 21 01:21:30 2022 Start to create manhattan plot with 808 variants: Fri Oct 21 01:21:30 2022 -Loading gtf files from:defualt Fri Oct 21 01:21:39 2022 -plotting gene track.. Fri Oct 21 01:21:39 2022 -Finished plotting gene track.. Fri Oct 21 01:21:39 2022 -Found 1 significant variants with a sliding window size of 500 kb... Fri Oct 21 01:21:39 2022 Finished creating Manhattan plot successfully! Fri Oct 21 01:21:39 2022 -Skip annotating Fri Oct 21 01:21:39 2022 Start to plot manhattan/qq plot with the following basic settings: Fri Oct 21 01:21:39 2022 -Genome-wide significance level is set to 5e-08 ... Fri Oct 21 01:21:39 2022 -Raw input contains 707780 variants... Fri Oct 21 01:21:39 2022 -Plot layout mode is : r Fri Oct 21 01:21:39 2022 -Region to plot : chr7:156938803-157138803. Fri Oct 21 01:21:39 2022 -Extract SNPs in region : chr7:156938803-157138803... Fri Oct 21 01:21:40 2022 -Extract SNPs in specified regions: 1121 Fri Oct 21 01:21:40 2022 Finished loading specified columns from the sumstats. Fri Oct 21 01:21:40 2022 Start conversion and sanity check: Fri Oct 21 01:21:40 2022 -Removed 0 variants with nan in P column ... Fri Oct 21 01:21:40 2022 -P values are being converted to -log10(P)... Fri Oct 21 01:21:40 2022 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... Fri Oct 21 01:21:40 2022 -Sanity check: 0 na/inf/-inf variants will be removed... Fri Oct 21 01:21:40 2022 -Maximum -log10(P) values is 7.948076083953893 . Fri Oct 21 01:21:40 2022 Finished data conversion and sanity check. Fri Oct 21 01:21:40 2022 Start to create manhattan plot with 1121 variants: Fri Oct 21 01:21:41 2022 -Loading gtf files from:defualt Fri Oct 21 01:21:50 2022 -plotting gene track.. Fri Oct 21 01:21:50 2022 -Finished plotting gene track.. Fri Oct 21 01:21:50 2022 -Found 1 significant variants with a sliding window size of 500 kb... Fri Oct 21 01:21:50 2022 Finished creating Manhattan plot successfully! Fri Oct 21 01:21:50 2022 -Skip annotating
check my available reference data¶
In [12]:
Copied!
gl.check_downloaded_ref()
gl.check_downloaded_ref()
Fri Nov 4 12:16:29 2022 Start to check downloaded reference files... Fri Nov 4 12:16:29 2022 1kg_eas_hg19 : /Users/he/work/gwaslab/src/gwaslab/data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz Fri Nov 4 12:16:29 2022 1kg_eas_hg19_tbi : /Users/he/work/gwaslab/src/gwaslab/data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz.tbi Fri Nov 4 12:16:29 2022 testlink : /Users/he/work/gwaslab/src/gwaslab/data/EAS.chr22.split_norm_af.1kgp3v5.vcf.gz
Out[12]:
{'1kg_eas_hg19': '/Users/he/work/gwaslab/src/gwaslab/data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz',
'1kg_eas_hg19_tbi': '/Users/he/work/gwaslab/src/gwaslab/data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz.tbi',
'testlink': '/Users/he/work/gwaslab/src/gwaslab/data/EAS.chr22.split_norm_af.1kgp3v5.vcf.gz'}
In [18]:
Copied!
gl.get_path("1kg_eas_hg19")
gl.get_path("1kg_eas_hg19")
Out[18]:
'/Users/he/work/gwaslab/src/gwaslab/data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz'
In [17]:
Copied!
mysumstats.plot_mqq(mode="r",region=(7,156538803,157538803),region_grid=True,
vcf_path=gl.get_path("1kg_eas_hg19"),taf=[4,0,0.95,1,1])
mysumstats.plot_mqq(mode="r",region=(7,156538803,157538803),region_grid=True,
vcf_path=gl.get_path("1kg_eas_hg19"),taf=[4,0,0.95,1,1])
Fri Nov 4 12:32:36 2022 Start to plot manhattan/qq plot with the following basic settings: Fri Nov 4 12:32:36 2022 -Genome-wide significance level is set to 5e-08 ... Fri Nov 4 12:32:36 2022 -Raw input contains 707780 variants... Fri Nov 4 12:32:36 2022 -Plot layout mode is : r Fri Nov 4 12:32:36 2022 -Region to plot : chr7:156538803-157538803. Fri Nov 4 12:32:36 2022 -Extract SNPs in region : chr7:156538803-157538803... Fri Nov 4 12:32:38 2022 -Extract SNPs in specified regions: 5831 Fri Nov 4 12:32:38 2022 Finished loading specified columns from the sumstats. Fri Nov 4 12:32:38 2022 Start conversion and sanity check: Fri Nov 4 12:32:38 2022 -Removed 0 variants with nan in CHR or POS column ... Fri Nov 4 12:32:38 2022 -Removed 0 variants with nan in P column ... Fri Nov 4 12:32:38 2022 -P values are being converted to -log10(P)... Fri Nov 4 12:32:38 2022 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... Fri Nov 4 12:32:38 2022 -Sanity check: 0 na/inf/-inf variants will be removed... Fri Nov 4 12:32:38 2022 -Maximum -log10(P) values is 7.948076083953893 . Fri Nov 4 12:32:38 2022 Finished data conversion and sanity check. Fri Nov 4 12:32:38 2022 Start to load reference genotype... Fri Nov 4 12:32:38 2022 -reference vcf path : /Users/he/work/gwaslab/src/gwaslab/data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz Fri Nov 4 12:32:40 2022 -Retrieving index... 16384 Fri Nov 4 12:32:40 2022 -Calculating Rsq... Fri Nov 4 12:32:40 2022 Finished loading reference genotype successfully! Fri Nov 4 12:32:40 2022 Start to create manhattan plot with 5831 variants: Fri Nov 4 12:32:41 2022 -Loading gtf files from:defualt Fri Nov 4 12:32:52 2022 -plotting gene track.. Fri Nov 4 12:32:53 2022 -Finished plotting gene track.. Fri Nov 4 12:32:53 2022 -Found 1 significant variants with a sliding window size of 500 kb... Fri Nov 4 12:32:53 2022 Finished creating Manhattan plot successfully! Fri Nov 4 12:32:53 2022 -Skip annotating
Out[17]:
(<Figure size 3000x2000 with 4 Axes>, <gwaslab.Log.Log at 0x7fcd214a0a30>)
In [ ]:
Copied!